Skip to main content

Filtering

We often only want to look at specific parts of the data. To be able to do so we can filter the data in a few ways.

Masks

We can use conditions which will return a Series of whether the column matches or doesn"t match the condition. This boolean Series, a so called filter mask can then be applied to the dataframe to only have the rows that are True (or False if we want that).

import pandas as pd

df_cars = pd.read_csv("data/cars.csv", sep=";", skiprows=[1])
mask = df_cars["Cylinders"] == "4"
mask
0      False
1 False
2 False
3 False
4 False
...
401 False
402 False
403 False
404 False
405 False
Name: Cylinders, Length: 406, dtype: bool
df_cars[mask]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
df_cars[df_cars["Cylinders"] == "4"]  # all at once
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin

Conditions can also be combined with & as and, and | as or.

df_cars[(df_cars["Cylinders"] == "4") & (df_cars["MPG"] == "24.0")]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
df_cars[(df_cars["MPG"] == "4") | (df_cars["MPG"] == "24.0")]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin

Isin

We can check to see if certain values are in a list.

mpgs = ["24.0", "25.0", "30.0"]
df_cars[df_cars["MPG"].isin(mpgs)]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin

Str accessor

The functions and methods under the str accessor provide flexible ways to filter rows based on strings.

df_cars[df_cars["Car"].str.startswith("Toyota")]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
20 Toyota Corolla Mark ii 24.0 4 113.0 95.0 2372.0 15.0 70 Japan
37 Toyota Corolla 25.0 4 113.0 95.0 2228.0 14.0 71 Japan
60 Toyota Corolla 1200 31.0 4 71.0 65.0 1773.0 19.0 71 Japan
64 Toyota Corolla Hardtop 24.0 4 113.0 95.0 2278.0 15.5 72 Japan
89 Toyota Corolla Mark II (sw) 23.0 4 120.0 97.0 2506.0 14.5 72 Japan
91 Toyota Corolla 1600 (sw) 27.0 4 97.0 88.0 2100.0 16.5 72 Japan
115 Toyota Camry 20.0 4 97.0 88.0 2279.0 19.0 73 Japan
130 Toyota Mark II 20.0 6 156.0 122.0 2807.0 13.5 73 Japan
138 Toyota Corolla 1200 32.0 4 71.0 65.0 1836.0 21.0 74 Japan
151 Toyota Corolla 31.0 4 76.0 52.0 1649.0 16.5 74 Japan
174 Toyota Corolla 29.0 4 97.0 75.0 2171.0 16.0 75 Japan
178 Toyota Corolla 24.0 4 134.0 96.0 2702.0 13.5 75 Japan
212 Toyota Corolla 28.0 4 97.0 75.0 2155.0 16.4 76 Japan
217 Toyota Mark II 19.0 6 156.0 108.0 2930.0 15.5 76 Japan
242 Toyota Corolla Liftback 26.0 4 97.0 75.0 2265.0 18.2 77 Japan
274 Toyota Corolla 27.5 4 134.0 95.0 2560.0 14.2 78 Japan
277 Toyota Celica GT Liftback 21.1 4 134.0 95.0 2515.0 14.8 78 Japan
317 Toyota Corolla Tercel 38.1 4 89.0 60.0 1968.0 18.8 80 Japan
325 Toyota Corolla Liftback 29.8 4 134.0 90.0 2711.0 15.5 80 Japan
328 Toyota Corolla 32.2 4 108.0 75.0 2265.0 15.2 80 Japan
350 Toyota Starlet 39.1 4 79.0 58.0 1755.0 16.9 81 Japan
355 Toyota Tercel 37.7 4 89.0 62.0 2050.0 17.3 81 Japan
363 Toyota Corolla 32.4 4 108.0 75.0 2350.0 16.8 81 Japan
369 Toyota Cressida 25.4 6 168.0 116.0 2900.0 12.6 81 Japan
390 Toyota Corolla 34.0 4 108.0 70.0 2245.0 16.9 82 Japan
398 Toyota Celica GT 32.0 4 144.0 96.0 2665.0 13.9 82 Japan
df_cars[df_cars["Car"].str.contains("Corolla")]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
20 Toyota Corolla Mark ii 24.0 4 113.0 95.0 2372.0 15.0 70 Japan
37 Toyota Corolla 25.0 4 113.0 95.0 2228.0 14.0 71 Japan
60 Toyota Corolla 1200 31.0 4 71.0 65.0 1773.0 19.0 71 Japan
64 Toyota Corolla Hardtop 24.0 4 113.0 95.0 2278.0 15.5 72 Japan
89 Toyota Corolla Mark II (sw) 23.0 4 120.0 97.0 2506.0 14.5 72 Japan
91 Toyota Corolla 1600 (sw) 27.0 4 97.0 88.0 2100.0 16.5 72 Japan
138 Toyota Corolla 1200 32.0 4 71.0 65.0 1836.0 21.0 74 Japan
151 Toyota Corolla 31.0 4 76.0 52.0 1649.0 16.5 74 Japan
174 Toyota Corolla 29.0 4 97.0 75.0 2171.0 16.0 75 Japan
178 Toyota Corolla 24.0 4 134.0 96.0 2702.0 13.5 75 Japan
212 Toyota Corolla 28.0 4 97.0 75.0 2155.0 16.4 76 Japan
242 Toyota Corolla Liftback 26.0 4 97.0 75.0 2265.0 18.2 77 Japan
274 Toyota Corolla 27.5 4 134.0 95.0 2560.0 14.2 78 Japan
317 Toyota Corolla Tercel 38.1 4 89.0 60.0 1968.0 18.8 80 Japan
325 Toyota Corolla Liftback 29.8 4 134.0 90.0 2711.0 15.5 80 Japan
328 Toyota Corolla 32.2 4 108.0 75.0 2265.0 15.2 80 Japan
363 Toyota Corolla 32.4 4 108.0 75.0 2350.0 16.8 81 Japan
390 Toyota Corolla 34.0 4 108.0 70.0 2245.0 16.9 82 Japan

Tilde

The tilde operator is used for “not” logic in filtering. If we add the tilde operator before the filter expression, the rows that do not fit the condition are returned.

df_cars[~df_cars["Car"].str.contains("Corolla")]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504.0 12.0 70 US
1 Buick Skylark 320 15.0 8 350.0 165.0 3693.0 11.5 70 US
2 Plymouth Satellite 18.0 8 318.0 150.0 3436.0 11.0 70 US
3 AMC Rebel SST 16.0 8 304.0 150.0 3433.0 12.0 70 US
4 Ford Torino 17.0 8 302.0 140.0 3449.0 10.5 70 US
... ... ... ... ... ... ... ... ... ...
401 Ford Mustang GL 27.0 4 140.0 86.0 2790.0 15.6 82 US
402 Volkswagen Pickup 44.0 4 97.0 52.0 2130.0 24.6 82 Europe
403 Dodge Rampage 32.0 4 135.0 84.0 2295.0 11.6 82 US
404 Ford Ranger 28.0 4 120.0 79.0 2625.0 18.6 82 US
405 Chevy S-10 31.0 4 119.0 82.0 2720.0 19.4 82 US

388 rows × 9 columns

df_cars[~df_cars["MPG"].isin(mpgs)]
Car MPG Cylinders Displacement Horsepower Weight Acceleration Model Origin
0 Chevrolet Chevelle Malibu 18.0 8 307.0 130.0 3504.0 12.0 70 US
1 Buick Skylark 320 15.0 8 350.0 165.0 3693.0 11.5 70 US
2 Plymouth Satellite 18.0 8 318.0 150.0 3436.0 11.0 70 US
3 AMC Rebel SST 16.0 8 304.0 150.0 3433.0 12.0 70 US
4 Ford Torino 17.0 8 302.0 140.0 3449.0 10.5 70 US
... ... ... ... ... ... ... ... ... ...
401 Ford Mustang GL 27.0 4 140.0 86.0 2790.0 15.6 82 US
402 Volkswagen Pickup 44.0 4 97.0 52.0 2130.0 24.6 82 Europe
403 Dodge Rampage 32.0 4 135.0 84.0 2295.0 11.6 82 US
404 Ford Ranger 28.0 4 120.0 79.0 2625.0 18.6 82 US
405 Chevy S-10 31.0 4 119.0 82.0 2720.0 19.4 82 US

406 rows × 9 columns